Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Chromatin Immunoprecipitation Sequencing ◾ 245

programs will create a control dataset by shuffling each of the sequences in the primary

input dataset.

The following DREME commands search for motifs in the FASTA sequences of the three ChIP-Seq samples. Because the process may take a long time and this is only practice, you can run the command for a single sample to save time. Run the commands from inside the “motifs” directory, where the FASTA files are found.

dreme -verbosity 2 \
    -oc dreme_motifs_chip1 \
    -dna \
    -p chip1_peaks.fasta \
    -t 14400 \
    -e 0.05

dreme -verbosity 2 \
    -oc dreme_motifs_chip2 \
    -dna \
    -p chip2_peaks.fasta \
    -t 14400 \
    -e 0.05

dreme -verbosity 2 \
    -oc dreme_motifs_chip3 \
    -dna \
    -p chip3_peaks.fasta \
    -t 14400 \
    -e 0.05
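Because the three commands above differ only in the sample name, they can also be written as a loop. The following is a minimal sketch, not part of the original workflow: the helper function outdir_for is a hypothetical name introduced here for illustration, and the loop assumes the FASTA files follow the chipN_peaks.fasta naming used above and that dreme is on the PATH.

```shell
# Hypothetical helper: name of the DREME output directory for a sample (e.g. chip1).
outdir_for() {
    echo "dreme_motifs_$1"
}

for sample in chip1 chip2 chip3; do
    fasta="${sample}_peaks.fasta"
    # Skip samples whose FASTA file is missing or empty.
    if [ ! -s "$fasta" ]; then
        echo "Skipping ${sample}: ${fasta} not found or empty" >&2
        continue
    fi
    # Same options as the individual commands above.
    dreme -verbosity 2 \
        -oc "$(outdir_for "$sample")" \
        -dna \
        -p "$fasta" \
        -t 14400 \
        -e 0.05
done
```

To process a single sample only, reduce the list after "in" to that sample name, for example "for sample in chip1; do".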

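The “-oc” option specifies the output directory (overwriting it if it already exists), “-dna” specifies that the sequences are DNA, “-p” specifies the primary dataset, “-t” specifies the maximum elapsed time in seconds as a stopping criterion (14400 seconds is 4 hours), and “-e” specifies the E-value threshold; DREME stops when the E-value of the next motif exceeds it.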

The output files will be saved in the directories “dreme_motifs_chip1”, “dreme_motifs_chip2”, and “dreme_motifs_chip3”. In each directory, the motifs are reported in an HTML file (dreme.html), an XML file (dreme.xml), and a text file (dreme.txt). You can open each of these files with an appropriate program. For example, change into one of the output directories and display the HTML file using Firefox as follows:

firefox dreme.html
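If no browser is available, the motifs can also be listed from the text output. The sketch below assumes that dreme.txt is written in the MEME text format, where each motif record begins with a line starting with “MOTIF” followed by the motif sequence; the function name list_motifs is hypothetical. Run it from the “motifs” directory.

```shell
# Hypothetical helper: print the motif sequence from every "MOTIF" line of a file.
list_motifs() {
    awk '$1 == "MOTIF" { print $2 }' "$1"
}

for dir in dreme_motifs_chip1 dreme_motifs_chip2 dreme_motifs_chip3; do
    # Only report directories that DREME has actually written.
    if [ -f "$dir/dreme.txt" ]; then
        echo "== $dir =="
        list_motifs "$dir/dreme.txt"
    fi
done
```

The E-values and other details remain easier to read in the HTML report; this listing is only a quick check that motifs were found.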

Figure 6.19 shows the motifs as displayed in the HTML file. For each motif, the figure shows the motif sequence, logo, RC logo (reverse complement logo), and E-value. The motif sequence logo is a graphical representation of the sequence conservation of DNA nucleotides. A DNA sequence logo shows a stack of the nucleobase letters A, C, G, and T at each position of the motif. The relative sizes of the letters in a stack reflect their frequency at that position in the aligned sequences, and the total height of the stack reflects how conserved the position is. The sequence of the motif uses the IUPAC nucleotide codes, which represent each of the 15 possible non-empty combinations of the four bases, as shown in Table 6.2.
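To illustrate how the IUPAC codes expand, the following sketch converts a motif written in IUPAC codes into a regular expression in which each ambiguity code becomes a bracketed set of bases. The function name iupac_to_regex and the example motif GGAAR are hypothetical and are not taken from the DREME output; the function assumes uppercase input.

```shell
# Hypothetical helper: expand IUPAC ambiguity codes into regex character classes.
# A, C, G, and T are left unchanged; the 11 ambiguity codes are expanded.
iupac_to_regex() {
    echo "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g' \
        -e 's/W/[AT]/g'  -e 's/K/[GT]/g'  -e 's/M/[AC]/g' \
        -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
        -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

iupac_to_regex GGAAR    # prints GGAA[AG]
```

The resulting pattern can be passed to “grep -E” for a rough search of the peak sequences. Such a search covers the forward strand only and misses matches that span FASTA line breaks, so it does not replace a proper motif scanning tool.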